Learning in high dimensions with projected linear discriminants
Abstract
The enormous power of modern computers has made it possible to build statistical models of data whose dimensionality would have made the task inconceivable only decades ago. Experience with such modelling, however, has made researchers aware of many issues that arise in high-dimensional domains, collectively known as ‘the curse of dimensionality’, which can frustrate practitioners’ efforts to build good models of the world from these data. When the dimensionality is very large, both low-dimensional methods and geometric intuition break down. To mitigate the curse of dimensionality we can use low-dimensional representations of the original data that capture most of the information they contain. However, little is currently known about the effect of such dimensionality reduction on classifier performance. In this thesis we develop theory quantifying the effect of random projection, a recent and very promising non-adaptive dimensionality reduction technique, on the classification performance of Fisher’s Linear Discriminant (FLD), a successful and widely used linear classifier. We tackle the issues associated with small sample size and high dimensionality by using ensembles of randomly projected FLD classifiers, and we develop theory explaining why this approach performs well. Finally, we quantify the generalization error of Kernel FLD, a related non-linear projected classifier.
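The idea of a randomly projected FLD ensemble can be illustrated with a minimal sketch. This is not the thesis's own implementation: the Gaussian projection matrix, the pooled covariance estimate, the averaging of member discriminants, the synthetic data, and all function and parameter names (`fit_rp_fld_ensemble`, `predict`, `k`, `n_members`) are assumptions made for illustration. The setting mimics the small-sample case, with fewer training points than dimensions.

```python
import numpy as np

def fit_rp_fld_ensemble(X, y, k, n_members, rng):
    """Average n_members FLD discriminants, each fitted in a k-dim random projection.

    Hypothetical helper; assumes binary labels in {0, 1} and k < n_samples - 2
    so that the projected covariance estimate is invertible.
    """
    d = X.shape[1]
    mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    # Centre each observation on its own class mean (pooled within-class scatter).
    Xc = np.where((y == 1)[:, None], X - mu1, X - mu0)
    members = []
    for _ in range(n_members):
        R = rng.standard_normal((k, d)) / np.sqrt(k)  # Gaussian random projection
        Z = Xc @ R.T                                  # projected, centred data
        S = Z.T @ Z / (len(X) - 2)                    # k x k pooled covariance
        # FLD direction in the projected space, mapped back to the data space.
        members.append(R.T @ np.linalg.solve(S, R @ (mu1 - mu0)))
    w = np.mean(members, axis=0)
    b = -w @ (mu0 + mu1) / 2                          # threshold at the midpoint of the means
    return w, b

def predict(w, b, X):
    """Assign class 1 where the averaged discriminant is positive."""
    return (X @ w + b > 0).astype(int)

# Synthetic demo (assumed data): 40 training points in 200 dimensions.
rng = np.random.default_rng(0)
d, n_train, n_test = 200, 40, 200
y_train = np.repeat([0, 1], n_train // 2)
X_train = rng.standard_normal((n_train, d)) + y_train[:, None]  # class 1 shifted by 1 per axis
y_test = np.repeat([0, 1], n_test // 2)
X_test = rng.standard_normal((n_test, d)) + y_test[:, None]

w, b = fit_rp_fld_ensemble(X_train, y_train, k=10, n_members=50, rng=rng)
accuracy = (predict(w, b, X_test) == y_test).mean()
print(f"test accuracy: {accuracy:.3f}")
```

In this sketch each member only inverts a small k x k covariance matrix, which stays well conditioned even though the full 200 x 200 sample covariance is singular with 40 observations; averaging the members is one simple way to combine them.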
Similar resources
Candidates for Synergies: Linear Discriminants versus Principal Components
Movement primitives or synergies have been extracted from human hand movements using several matrix factorization, dimensionality reduction, and classification methods. Principal component analysis (PCA) is widely used to obtain the first few significant eigenvectors of covariance that explain most of the variance of the data. Linear discriminant analysis (LDA) is also used as a supervised lear...
Classification and Reductio-ad-Absurdum
Proofs for the optimality of classification in real-world machine learning situations are constructed. The validity of each proof requires reasoning about the probability of certain subsets of feature vectors. It is shown that linear discriminants classify by making the least demanding assumptions on the values of these probabilities. This enables measuring the confidence of classification by l...
Discriminating Traces with Time
What properties about the internals of a program explain the possible differences in its overall running time for different inputs? In this paper, we propose a formal framework for considering this question we dub trace-set discrimination. We show that even though the algorithmic problem of computing maximum likelihood discriminants is NP-hard, approaches based on integer linear programming (ILP)...
Random Projections as Regularizers: Learning a Linear Discriminant Ensemble from Fewer Observations than Dimensions
We examine the performance of an ensemble of randomly-projected Fisher Linear Discriminant classifiers, focusing on the case when there are fewer training observations than data dimensions. Our ensemble is learned from a sequence of randomly-projected representations of the original high dimensional data and therefore for this approach data can be collected, stored and processed in such a compr...
Boosted Dyadic Kernel Discriminants
We introduce a novel learning algorithm for binary classification with hyperplane discriminants based on pairs of training points from opposite classes (dyadic hypercuts). This algorithm is further extended to nonlinear discriminants using kernel functions satisfying Mercer’s conditions. An ensemble of simple dyadic hypercuts is learned incrementally by means of a confidence-rated version of Ad...